AITopics

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Neural Information Processing Systems, Oct-2-2025, 07:21:53 GMT

Inference Stage Optimization for Cross-scenario 3D Human Pose Estimation (Supplementary Material)

We compute the limb length ratios of upper to lower arm and leg (for both the left and right sides), as well as the torso, for geometric distribution analysis. The joints and body parts of interest are defined in Fig. S1. All the results are reported under the unscaled protocol. How does the choice of self-supervised learning technique impact accuracy? We can observe that Adv (Joint, Vanilla and Online settings) improves accuracy over the Baseline by a large margin.
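The limb length ratio described above can be illustrated with a minimal sketch. The joint names and coordinates below are hypothetical (the actual joint definitions are in Fig. S1, which is not reproduced here), and the torso measure is omitted because the excerpt does not say how it is defined.

```python
import math

def limb_ratio(proximal, middle, distal):
    """Return upper-segment length divided by lower-segment length.

    Each argument is a 3D joint position (x, y, z). For an arm this is
    shoulder, elbow, wrist; for a leg it is hip, knee, ankle.
    """
    upper = math.dist(proximal, middle)  # e.g. shoulder -> elbow
    lower = math.dist(middle, distal)    # e.g. elbow -> wrist
    return upper / lower

# Hypothetical joint coordinates, chosen only for illustration.
pose = {
    "l_shoulder": (0.0, 0.0, 0.0),
    "l_elbow": (0.0, -3.0, 0.0),
    "l_wrist": (0.0, -3.0, 2.0),
    "l_hip": (0.0, -5.0, 0.0),
    "l_knee": (0.0, -9.0, 0.0),
    "l_ankle": (0.0, -13.0, 0.0),
}

left_arm_ratio = limb_ratio(pose["l_shoulder"], pose["l_elbow"], pose["l_wrist"])
left_leg_ratio = limb_ratio(pose["l_hip"], pose["l_knee"], pose["l_ankle"])
```

Because the ratio divides one segment length by another, it does not change when the whole skeleton is uniformly rescaled, which is presumably why it suits a comparison of pose geometry across datasets.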

artificial intelligence, machine learning, pose estimation, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.49)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.43)

Neural Information Processing Systems, Aug-17-2025, 01:13:40 GMT

b618c3210e934362ac261db280128c22-Paper.pdf

adaptation, artificial intelligence, machine learning, (13 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Neural Information Processing Systems, Jan-22-2025, 04:36:19 GMT

Review for NeurIPS paper: Inference Stage Optimization for Cross-scenario 3D Human Pose Estimation

Weaknesses: Although the authors offer many insights into why ISO performs well, I still have questions about the Shared Feature Extractor, SSL Head, and FSL Head. As the SSL is from existing work and the main contribution is the combination of SSL with FSL, answering these questions clearly is important. What kind of features and information are shared in the Shared Feature Extractor? How much will it diverge when trained on new target data, such that it causes the FSL head to fail? What information is kept in the FSL head?

inference stage optimization, information, shared feature extractor, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (0.49)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.40)

arXiv.org Artificial Intelligence, Oct-2-2024

Meta-TTT: A Meta-learning Minimax Framework For Test-Time Training

Tao, Chen, Shen, Li, Mondal, Soumik

Test-time domain adaptation is a challenging task that aims to adapt a pre-trained model to limited, unlabeled target data during inference. Current methods that rely on self-supervision and entropy minimization underperform when the self-supervised learning (SSL) task does not align well with the primary objective. Additionally, minimizing entropy can lead to suboptimal solutions when there is limited diversity within minibatches. This paper introduces a meta-learning minimax framework for test-time training on batch normalization (BN) layers, ensuring that the SSL task aligns with the primary task while addressing minibatch overfitting. We adopt a mixed-BN approach that interpolates current test batch statistics with the statistics from source domains and propose a stochastic domain synthesizing method to improve model generalization and robustness to domain shifts. Extensive experiments demonstrate that our method surpasses state-of-the-art techniques across various domain adaptation and generalization benchmarks, significantly enhancing the pre-trained model's robustness on unseen domains.
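The mixed-BN idea mentioned above, interpolating the current test batch statistics with source-domain statistics, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the interpolation rule, so the linear blend, the weight `alpha`, and the function names here are hypothetical and not the authors' implementation.

```python
from statistics import fmean, pvariance

def mixed_bn_stats(batch, source_mean, source_var, alpha=0.5):
    """Blend source statistics with statistics of the current test batch.

    alpha is a hypothetical mixing weight: alpha=1 uses only the source
    statistics, alpha=0 uses only the test-batch statistics.
    """
    batch_mean = fmean(batch)
    batch_var = pvariance(batch, mu=batch_mean)  # population variance
    mean = alpha * source_mean + (1 - alpha) * batch_mean
    var = alpha * source_var + (1 - alpha) * batch_var
    return mean, var

def normalize(batch, mean, var, eps=1e-5):
    """Standard batch-norm style normalization with the blended statistics."""
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

# One feature channel of a tiny test batch (made-up values).
test_batch = [1.0, 3.0]
mean, var = mixed_bn_stats(test_batch, source_mean=0.0, source_var=1.0, alpha=0.5)
normalized = normalize(test_batch, mean, var)
```

With a small or low-diversity test batch, the batch statistics alone are noisy, so anchoring them to the source statistics is one plausible way to limit the minibatch overfitting the abstract refers to.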

adaptation, domain adaptation, statistics, (16 more...)

2410.01709

Country:

Europe > Russia (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Asia > Russia (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.73)

arXiv.org Artificial Intelligence, Jul-16-2024

Adaptive Cascading Network for Continual Test-Time Adaptation

Nguyen, Kien X., Qiao, Fengchun, Peng, Xi

We study the problem of continual test-time adaptation where the goal is to adapt a source pre-trained model to a sequence of unlabelled target domains at test time. Existing methods on test-time training suffer from several limitations: (1) mismatch between the feature extractor and classifier; (2) interference between the main and self-supervised tasks; (3) lack of the ability to quickly adapt to the current distribution. In light of these challenges, we propose a cascading paradigm that simultaneously updates the feature extractor and classifier at test time, mitigating the mismatch between them and enabling long-term model adaptation. The pre-training of our model is structured within a meta-learning framework, thereby minimizing the interference between the main and self-supervised tasks and encouraging fast adaptation in the presence of limited unlabelled data. Additionally, we introduce innovative evaluation metrics, average accuracy and forward transfer, to effectively measure the model's adaptation capabilities in dynamic, real-world scenarios. Extensive experiments and ablation studies demonstrate the superiority of our approach in a range of tasks including image classification, text classification, and speech recognition.

adaptation, batch size, target domain, (15 more...)

2407.1224

Country:

North America > United States > Delaware > New Castle County > Newark (0.14)
North America > United States > Idaho > Ada County > Boise (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Apr-17-2024

When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery

Xie, Yiqun, Wang, Zhihao, Chen, Weiye, Li, Zhili, Jia, Xiaowei, Li, Yanhua, Wang, Ruichen, Chai, Kangyang, Li, Ruohan, Skakun, Sergii

Foundation models, i.e., very large deep learning models, have demonstrated impressive performance in various language and vision tasks that is otherwise difficult to reach using smaller-size models. The major success of GPT-type language models is particularly exciting and raises expectations for the potential of foundation models in other domains, including satellite remote sensing. In this context, great efforts have been made to build foundation models and test their capabilities in broader applications; examples include Prithvi by NASA-IBM, Segment-Anything-Model, ViT, etc. This leads to an important question: Are foundation models always a suitable choice for different remote sensing tasks, and when or when not? This work aims to enhance the understanding of the status and suitability of foundation models for pixel-level classification using multispectral imagery at moderate resolution, through comparisons with traditional machine learning (ML) and regular-size deep learning models. Interestingly, the results reveal that in many scenarios traditional ML models still have similar or better performance compared to foundation models, especially for tasks where texture is less useful for classification. On the other hand, deep learning models did show more promising results for tasks where labels partially depend on texture (e.g., burn scar), while the difference in performance between foundation models and deep learning models is not obvious. The results conform with our analysis: the suitability of foundation models depends on the alignment between the self-supervised learning tasks and the real downstream tasks, and the typical masked autoencoder paradigm is not necessarily suitable for many remote sensing problems.

artificial intelligence, deep learning, machine learning, (15 more...)

2404.11797

Country:

North America > United States > Maryland (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.82)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Mar-29-2024

Sound event localization and classification using WASN in Outdoor Environment

Zhang, Dongzhe, Chen, Jianfeng, Bai, Jisheng, Wang, Mou

Deep learning-based sound event localization and classification is an emerging research area within wireless acoustic sensor networks. However, current methods for sound event localization and classification typically rely on a single microphone array, making them susceptible to signal attenuation and environmental noise, which limits their monitoring range. Moreover, methods using multiple microphone arrays often focus solely on source localization, neglecting the aspect of sound event classification. In this paper, we propose a deep learning-based method that employs multiple features and attention mechanisms to estimate the location and class of the sound source. We introduce a Soundmap feature to capture spatial information across multiple frequency bands. We also use the Gammatone filter to generate acoustic features more suitable for outdoor environments. Furthermore, we integrate attention mechanisms to learn channel-wise relationships and temporal dependencies within the acoustic features. To evaluate our proposed method, we conduct experiments using simulated datasets with different levels of noise and sizes of monitoring areas, as well as different arrays and source positions. The experimental results demonstrate the superiority of our proposed method over state-of-the-art methods in both sound event classification and sound source localization tasks. We also provide further analysis to explain the reasons for the observed errors.

array node, classification, event localization, (13 more...)

2403.2013

Country:

Asia > Singapore (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jan-12-2024

Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

Zhu, Pengfei, Wang, Qian, Wang, Yu, Li, Jialu, Hu, Qinghua

Attributed graph clustering is an unsupervised task that partitions nodes into different groups. Self-supervised learning (SSL) shows great potential in handling this task, and some recent studies simultaneously learn multiple SSL tasks to further boost performance. Currently, different SSL tasks are assigned the same set of weights for all graph nodes. However, we observe that some graph nodes whose neighbors are in different groups require significantly different emphases on SSL tasks. In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance. We design an innovative graph clustering approach, namely Dynamically Fusing Self-Supervised Learning (DyFSS). Specifically, DyFSS fuses features extracted from diverse SSL tasks using distinct weights derived from a gating network. To effectively learn the gating network, we design a dual-level self-supervised strategy that incorporates pseudo labels and the graph structure. Extensive experiments on five datasets show that DyFSS outperforms the state-of-the-art multi-task SSL methods by up to 8.66% on the accuracy metric. The code of DyFSS is available at: https://github.com/q086/DyFSS.
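The per-node fusion described above, where a gating network produces distinct weights for each SSL task's features, might look roughly like the sketch below. The softmax over gate scores and the weighted sum are assumptions made for illustration; the abstract does not specify the gating network's form, and the names `softmax` and `fuse_node` are hypothetical rather than taken from the DyFSS code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_node(task_embeddings, gate_logits):
    """Fuse one node's embeddings from several SSL tasks.

    task_embeddings: one embedding (list of floats) per SSL task.
    gate_logits: one score per SSL task for this node, standing in for
    the output of a gating network (assumed, not from the paper).
    """
    weights = softmax(gate_logits)
    dim = len(task_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, task_embeddings))
        for d in range(dim)
    ]

# Two hypothetical SSL tasks, each giving a 2-d embedding for the same node.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
fused_equal = fuse_node(embeddings, gate_logits=[0.0, 0.0])
fused_skewed = fuse_node(embeddings, gate_logits=[5.0, 0.0])
```

Since the gate scores are supplied per node, two nodes can weight the same SSL tasks differently, which is the contrast with assigning one shared set of task weights to every node.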

node representation, representation, ssl task, (12 more...)

2401.06595

Country:

North America > United States > New York > New York County > New York City (0.14)
Asia > China > Tianjin Province > Tianjin (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.91)